5 research outputs found

    Repetition Detection in a Dynamic String

    A string UU, for a non-empty string U, is called a square. Squares have been well studied from both a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^{o(1)} time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool: maintaining prefix-suffix matches of two dynamic strings. We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems, and our techniques have the potential to offer solutions to these problems in a dynamic text setting.
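    To make the definition of a square concrete, here is a minimal brute-force sketch that finds a longest square-substring of a static string. It is only a naive baseline that recomputes from scratch in roughly cubic time; it is not the dynamic representation described in the abstract, and the function name longest_square is hypothetical.

```python
def longest_square(s: str) -> str:
    """Return a longest substring of the form UU (U non-empty), or "" if none exists.

    Naive baseline: tries every half-length and start position.
    """
    n = len(s)
    # Try half-lengths from largest to smallest so the first hit is a longest square.
    for half in range(n // 2, 0, -1):
        for i in range(n - 2 * half + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                return s[i:i + 2 * half]
    return ""


if __name__ == "__main__":
    print(longest_square("banana"))    # "anan" is the square with U = "an"
    print(longest_square("abaabaab"))  # "abaaba" is the square with U = "aba"
```

    After a character substitution, this sketch would simply be rerun on the whole string, which is exactly the cost that a dynamic representation aims to avoid.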

    Sufficient Conditions for Efficient Indexing Under Different Matchings

    The most important task arising from the massive accumulation of digital data in the world is efficient access to this data, hence the importance of indexing. In the last decade, many different types of matching relations were defined, each requiring an efficient indexing scheme. Cole and Hariharan, in a groundbreaking paper [Cole and Hariharan, SIAM J. Comput., 33(1):26-42, 2003], formulated sufficient conditions for building an efficient index for quasi-suffix collections, that is, collections that behave as suffixes. It was shown that known matchings, including parameterized, 2-D array and order-preserving matchings, fit their indexing settings. In this paper, we formulate more basic sufficient conditions based on the order relation derived from the matching relation itself; our conditions are more general than the previously known conditions.

    The k-Mappability Problem Revisited

    The k-mappability problem has two integer parameters, m and k. For every subword of size m in a text S, we wish to report the number of indices in S at which the word occurs with at most k mismatches. The problem was recently tackled by Alzamel et al. [Mai Alzamel et al., 2018]. For a text over a constant-size alphabet Σ and k ∈ O(1), they present an algorithm with linear space and O(n log^{k+1} n) time. For the case in which k = 1 and the alphabet is of constant size, a faster algorithm with linear space and O(n log(n) log log(n)) time was presented in [Mai Alzamel et al., 2020]. In this work, we enhance the techniques of [Mai Alzamel et al., 2020] to obtain an algorithm with linear space and O(n log(n)) time for k = 1. Our algorithm removes the constraint of the alphabet being of constant size. We also present linear algorithms for the case of k = 1, |Σ| ∈ O(1) and m = Ω(√n).
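    As an illustration of the problem statement only, the following brute-force sketch computes the k-mappability counts directly from the definition in O(n^2 m) time. It is not the algorithm from the paper. The function name k_mappability is hypothetical, and the sketch assumes that each count includes the subword's own starting index (some formulations exclude it).

```python
def k_mappability(s: str, m: int, k: int) -> list[int]:
    """For each length-m subword of s, count the starting indices j at which
    the length-m subword of s differs from it in at most k positions.

    Assumption: the subword's own position is counted as well.
    """
    starts = range(len(s) - m + 1)  # empty when m > len(s)

    def mismatches(i: int, j: int) -> int:
        # Hamming distance between s[i:i+m] and s[j:j+m].
        return sum(1 for a, b in zip(s[i:i + m], s[j:j + m]) if a != b)

    return [sum(1 for j in starts if mismatches(i, j) <= k) for i in starts]


if __name__ == "__main__":
    print(k_mappability("abab", 2, 0))  # exact occurrences of "ab", "ba", "ab"
    print(k_mappability("aab", 2, 1))   # "aa" and "ab" differ in one position
```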

    Analysis of the Period Recovery Error Bound


    Shortest Beer Path Queries in Interval Graphs

    Our interest is in paths between pairs of vertices that go through at least one of a subset of the vertices known as beer vertices. Such a path is called a beer path, and the beer distance between two vertices is the length of the shortest beer path. We show that we can represent unweighted interval graphs using 2n log n + O(n) + O(|B| log n) bits, where |B| is the number of beer vertices. This data structure answers beer distance queries in O(log^ε n) time for any constant ε > 0 and shortest beer path queries in O(log^ε n + d) time, where d is the beer distance between the two nodes. We also show that proper interval graphs may be represented using 3n + o(n) bits to support beer distance queries in O(f(n) log n) time for any f(n) ∈ ω(1) and shortest beer path queries in O(d) time. All of these results also have time-space trade-offs. Lastly, we show that the information-theoretic lower bound for beer proper interval graphs is very close to the space of our structure, namely log(4+2√3)n - o(n) (or about 2.9n) bits.
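    The beer distance itself can be illustrated with a small baseline: in an unweighted graph, it equals the minimum over beer vertices b of d(u, b) + d(b, v), which two breadth-first searches compute in linear time per query. The sketch below is not the succinct data structure from the abstract; the helper names interval_graph, bfs_distances and beer_distance are hypothetical, and intervals are assumed to be closed.

```python
from collections import deque


def interval_graph(intervals):
    """Adjacency lists: two vertices are adjacent when their closed intervals overlap."""
    n = len(intervals)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (a, b), (c, d) = intervals[i], intervals[j]
            if a <= d and c <= b:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def beer_distance(adj, beer, u, v):
    """Length of a shortest u-v walk through at least one beer vertex, or None."""
    du = bfs_distances(adj, u)
    dv = bfs_distances(adj, v)
    candidates = [du[b] + dv[b] for b in beer if b in du and b in dv]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Four intervals forming the path 0 - 1 - 2 - 3.
    adj = interval_graph([(0, 2), (1, 4), (3, 6), (5, 8)])
    print(beer_distance(adj, {3}, 0, 1))  # detour to vertex 3 and back to vertex 1
```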